AD-A113 566   PRINCETON UNIV NJ DEPT OF STATISTICS   F/G 6/3
MOMENTS OF PARTICLE SIZE DISTRIBUTIONS UNDER SEQUENTIAL BREAKAGE--ETC(U)
NOV 81   A F SIEGEL, G SUGIHARA   DAAG29-79-C-0205
UNCLASSIFIED   TR-215   ARO-16669.20-M   NL

UNCLASSIFIED
SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)

REPORT DOCUMENTATION PAGE          READ INSTRUCTIONS BEFORE COMPLETING FORM

1. REPORT NUMBER: 16669.20-M
2. GOVT ACCESSION NO.:
3. RECIPIENT'S CATALOG NUMBER:
4. TITLE (and Subtitle): Moments of Particle Size Distributions Under
   Sequential Breakage with Applications to Species Abundance
5. TYPE OF REPORT & PERIOD COVERED: Technical
6. PERFORMING ORG. REPORT NUMBER:
7. AUTHOR(s): Andrew F. Siegel, George Sugihara
8. CONTRACT OR GRANT NUMBER(s): DAAG29 79 C 0205
9. PERFORMING ORGANIZATION NAME AND ADDRESS: Princeton University,
   Princeton, NJ 08540
10. PROGRAM ELEMENT, PROJECT, TASK AREA & WORK UNIT NUMBERS:
11. CONTROLLING OFFICE NAME AND ADDRESS: U. S. Army Research Office,
    Post Office Box 12211, Research Triangle Park, NC 27709
12. REPORT DATE: Nov 81
13. NUMBER OF PAGES:
14. MONITORING AGENCY NAME & ADDRESS (if different from Controlling Office):
15. SECURITY CLASS. (of this report): Unclassified
15a. DECLASSIFICATION/DOWNGRADING SCHEDULE:
16. DISTRIBUTION STATEMENT (of this Report): Approved for public release;
    distribution unlimited.
17. DISTRIBUTION STATEMENT (of the abstract entered in Block 20, if different
    from Report):
18. SUPPLEMENTARY NOTES: The view, opinions, and/or findings contained in
    this report are those of the author(s) and should not be construed as an
    official Department of the Army position, policy, or decision, unless so
    designated by other documentation.
19. KEY WORDS (Continue on reverse side if necessary and identify by block
    number): broken stick model; lognormal distribution; ecology; niche
    hierarchy model; random binary trees

EDITION OF 1 NOV 65 IS OBSOLETE          UNCLASSIFIED


MOMENTS  OF  PARTICLE  SIZE  DISTRIBUTIONS 
UNDER  SEQUENTIAL  BREAKAGE 
WITH  APPLICATIONS  TO  SPECIES  ABUNDANCE 


by 


Andrew  F.  Siegel 
and 

George  Sugihara 


Technical  Report  No.  215,  Series  2 
Department  of  Statistics 
Princeton  University 

November  1981 




This  work  was  supported  in  part  by  U.S.  Army  Research  Office  Grant 
Number  DAAG29-79-C-0205  and  a  Princeton  University  Prize  Fellowship. 




MOMENTS  OF  PARTICLE  SIZE  DISTRIBUTIONS 
UNDER  SEQUENTIAL  BREAKAGE 
WITH  APPLICATIONS  TO  SPECIES  ABUNDANCE 


ANDREW  F.  SIEGEL* 
and 

GEORGE  SUGIHARA** 
Princeton  University 


ABSTRACT 

The  sequential  broken  stick  model  has  appeared  in  numerous  contexts, 
including  biology,  physics,  engineering  and  geology.  Kolmogorov  showed 
that  under  appropriate  conditions,  sequential  breakage  processes  often 
yield  a  lognormal  distribution  of  particle  sizes.  Of  particular  interest 
to  ecologists  is  the  observed  variance  of  the  logarithms  of  the  sizes, 
which  characterizes  the  evenness  of  an  assemblage  of  species.  We  derive 
the  first  two  moments  for  the  logarithms  of  the  sizes  in  terms  of  the 
underlying  distribution  used  to  determine  the  successive  breakages.  In 
particular, for a process yielding n pieces, the expected sample variance
behaves asymptotically as a constant multiple of log(n). These results also yield a
new  identity  for  moments  of  path  lengths  in  random  binary  trees. 

Key  words:  BROKEN  STICK  MODEL;  LOGNORMAL  DISTRIBUTION;  ECOLOGY;  NICHE 
HIERARCHY  MODEL;  RANDOM  BINARY  TREES 


*  Postal  address:  Department  of  Statistics,  Princeton  University, 
Princeton,  N.J.,  08544,  U.S.A. 

**  Postal  address:  Department  of  Biology,  Princeton  University,  Princeton 
N.J.,  08544,  U.S.A. 


1.  INTRODUCTION 


When a distributional pattern is generated by an unobservable process,
insight into the mechanism of genesis can sometimes be gained with the
aid of a suitable model. Our focus here will be on lognormal distributions,
with the aim of studying a class of processes that give rise to
empirical families of lognormal curves. The importance of this pattern
rests largely on its ubiquity and the broad spectrum of contexts in which
it appears.

In engineering and geology lognormal distributions have been used
to describe quantities produced by natural and mechanical processes, such
as frequencies of particle sizes and life lengths of materials and
machines before failure (Epstein, 1947; Herdan, 1953). In economics and
sociology these distributions have been fit to data on incomes (Gibrat,
1931; Davies, 1946; Kapteyn, 1916) and numbers of people per occupation
(Clark, 1964). Applications in biology include characterizing data on
body sizes (Yuan, 1933; Camp, 1938; Cramer, 1946) and species abundance
(Preston, 1962; Aitchison and Brown, 1968; Patrick, 1968; Bulmer, 1974;
May, 1975; Pielou, 1975; Sugihara, 1980). Brown and Sanders (1981) have
shown that the lognormal distribution arises in a large variety of
classification procedures.

Some of these physical and biological contexts, where the natural
method of genesis involves repeated breakages, produce special families
of lognormal distributions. Kolmogorov (1941) has shown that when the
frequency of breakage is independent of the size of each particle, the
asymptotic distribution of particle sizes should tend to be lognormal.
Of interest here is that the mean and variance will depend on the number
of breakages applied. These parameters will therefore be coupled to the
number of particles generated, producing families of lognormal
distributions in which the variance of log sizes will increase with the
application of additional breakage events.

When such coupling is actually observed, it may suggest sequential
breakage as a possible method of genesis. This argument was used, for
example, in a recent model of species abundance (Sugihara, 1980) in which
sequential binary breakage of niche space was proposed to explain a
particular coupling of parameters observed in the lognormal species
abundance distribution (Preston, 1962). Investigating sequential breakage
mechanisms may be useful not only for clarifying the predictions of this
species abundance model, but also, in general, for understanding the
genesis of empirical families of lognormal curves having coupled parameters.

Monte Carlo estimates have been available (Sugihara, 1980) for the
relationship between the expected variance and the number of particles
(or species) in some special cases of repeated binary breakage. Our aim
here is to provide exact and asymptotic formulae for this relationship.
In addition to simplifying computation, these results will also yield
further insight into the nature of breakage processes. In particular,
we will show how the expected mean and variance of the logarithmic sizes
can be expressed in terms of auxiliary moments of the distribution of
breakage applied at each step. Underlying these results is a somewhat
surprising identity involving cross-moments of path lengths in random
binary trees.




2.  EXPECTED  MEANS  AND  VARIANCES 

Begin  with  a  stick  of  unit  length  and  a  breakage  distribution  F 
that is symmetric on (0,1). This is stage 1; at stage n the stick
will  be  broken  into  n  pieces.  To  go  from  stage  n  to  stage  n+1  , 
first  choose  a  piece  at  random  (uniformly  without  regard  to  size,  so  that 
each  piece  has  probability  1/n  of  being  chosen)  and  then  break  it  in 
two  according  to  a  proportion  chosen  independently  from  F  .  Because  the 
piece  to  be  broken  is  chosen  randomly  in  this  way,  we  lose  no  generality 
by  requiring  F  to  be  symmetric,  in  order  to  simplify  the  mathematical 
treatment. 
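The breakage scheme just described can be sketched in a few lines of Python. This is only an illustrative simulation, not the authors' code: the function names `break_stick` and `two_point` are hypothetical, and the two-point distribution (mass 1/2 at 1/4 and at 3/4) is used simply as one convenient symmetric choice of F.

```python
import math
import random

def break_stick(n, draw_proportion, rng):
    """Return the sizes of the n pieces present at stage n."""
    pieces = [1.0]                        # stage 1: the unbroken unit stick
    while len(pieces) < n:
        i = rng.randrange(len(pieces))    # uniform choice, without regard to size
        w = draw_proportion(rng)          # breakage proportion W drawn from F
        size = pieces.pop(i)
        pieces.extend([size * w, size * (1.0 - w)])
    return pieces

def two_point(rng):
    """Assumed example of F: mass 1/2 at 1/4 and at 3/4 (symmetric on (0,1))."""
    return rng.choice([0.25, 0.75])

rng = random.Random(0)                    # fixed seed so the run is repeatable
sizes = break_stick(50, two_point, rng)
log_sizes = [math.log(x) for x in sizes]  # the logarithms studied in this section
```

Each pass through the loop replaces one piece by two, so the sizes always sum to the original unit length.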

If $W$ is an observation from $F$, define moments
$\mu_k = E\{[\log(W)]^k\}$ and $\nu = E[\log(W)\log(1-W)]$. We will assume
that $\mu_1$ and $\mu_2$ are finite. At stage $n$, let the pieces have sizes
$X_{1n}, \ldots, X_{nn}$ in some random ordering so that these are
exchangeable (but not independent) random variables. The logarithms of the
sizes, $U_{in} = \log(X_{in})$, are of interest in many applications, as are
the sample mean $\bar{U}_n = \left(\sum_{i=1}^{n} U_{in}\right)/n$ and the
sample variance
$S_n^2 = \left(\sum_{i=1}^{n} (U_{in} - \bar{U}_n)^2\right)/(n-1)$. Note that
each $X_{in}$ is, marginally, the product of a random number of independent
proportions chosen from $F$, and each $U_{in}$ is the sum of the
corresponding logarithms.


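Theorem 1. The mean and variance of the logarithm, $U_{in}$, of the
size of a single random piece at stage $n$ are

$$E(U_{in}) = 2\mu_1 \sum_{k=2}^{n} \frac{1}{k} \tag{1}$$

$$\mathrm{Var}(U_{in}) = 2\mu_2 \sum_{k=2}^{n} \frac{1}{k} - 4\mu_1^2 \sum_{k=2}^{n} \frac{1}{k^2} \tag{2}$$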


Proof. By exchangeability, it will suffice to compute $E(U_{1n})$.
Condition according to whether this piece was or was not involved in the
most recent breakage, events with probabilities $2/n$ and $(n-2)/n$
respectively. Because the conditional distributions of $U_{1n}$ are those of
$U_{1,n-1} + \log(W)$ and $U_{1,n-1}$ respectively, where $W$ has
distribution $F$ and is independent of $U_{1,n-1}$, we obtain the recurrence

$$E(U_{1n}) = E(U_{1,n-1}) + \frac{2}{n}\mu_1 \tag{3}$$

whose solution with initial condition $E(U_{11}) = 0$ is (1). For the
variance, $\mathrm{Var}(U_{1n}) = E(U_{1n}^2) - [E(U_{1n})]^2$, condition as
before for each term of this difference, to establish

$$E(U_{1n}^2) = E(U_{1,n-1}^2) + \frac{4}{n}\mu_1 E(U_{1,n-1}) + \frac{2}{n}\mu_2 \tag{4}$$

and

$$[E(U_{1n})]^2 = [E(U_{1,n-1})]^2 + \frac{4}{n}\mu_1 E(U_{1,n-1}) + \frac{4}{n^2}\mu_1^2. \tag{5}$$

Subtracting (5) from (4) and simplifying, we obtain

$$\mathrm{Var}(U_{1n}) = \mathrm{Var}(U_{1,n-1}) + \frac{2}{n}\mu_2 - \frac{4}{n^2}\mu_1^2 \tag{6}$$

whose solution with initial condition $\mathrm{Var}(U_{11}) = 0$ is (2). □
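As a check on the algebra, the closed forms (1) and (2) can be tested against the one-step recurrences (3) and (6) in exact rational arithmetic. The sketch below takes (1) and (2) to read E(U) = 2 mu1 sum(1/k) and Var(U) = 2 mu2 sum(1/k) - 4 mu1^2 sum(1/k^2), with both sums over k = 2..n; the function names and the moment values passed in at the end are hypothetical and serve only this check.

```python
from fractions import Fraction

def mean_log_size(n, mu1):
    """Closed form (1): 2*mu1 * sum_{k=2}^n 1/k."""
    return 2 * mu1 * sum(Fraction(1, k) for k in range(2, n + 1))

def var_log_size(n, mu1, mu2):
    """Closed form (2): 2*mu2 * sum 1/k - 4*mu1^2 * sum 1/k^2, k = 2..n."""
    h1 = sum(Fraction(1, k) for k in range(2, n + 1))
    h2 = sum(Fraction(1, k * k) for k in range(2, n + 1))
    return 2 * mu2 * h1 - 4 * mu1 ** 2 * h2

def recurrences_hold(mu1, mu2, max_n=40):
    """Check the initial conditions and the increments in (3) and (6)."""
    if mean_log_size(1, mu1) != 0 or var_log_size(1, mu1, mu2) != 0:
        return False
    for n in range(2, max_n + 1):
        mean_step = mean_log_size(n, mu1) - mean_log_size(n - 1, mu1)
        var_step = var_log_size(n, mu1, mu2) - var_log_size(n - 1, mu1, mu2)
        if mean_step != 2 * mu1 / n:
            return False
        if var_step != 2 * mu2 / n - 4 * mu1 ** 2 / n ** 2:
            return False
    return True

# Arbitrary illustrative moments (any pair with mu2 >= mu1**2 would do).
print(recurrences_hold(Fraction(-3, 2), Fraction(7, 2)))
```

Using `Fraction` rather than floats means the comparison is an exact identity check, not a tolerance test.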

In a real situation, care must be taken with regard to the variance
term. Due to the dependence among $U_{1n}, \ldots, U_{nn}$, the sample
variance $S_n^2$ should not be compared to (2), which is the expected sample
variance for an independent sample with the same marginals. Instead, the
following quantities should be used.


Theorem 2. The expected sample mean and variance at stage $n$ are

$$E(\bar{U}_n) = 2\mu_1 \sum_{k=2}^{n} \frac{1}{k} \tag{7}$$

$$E(S_n^2) = \left\{2\left(1 + \frac{2}{n-1}\right) \sum_{k=2}^{n} \frac{1}{k} - 2\right\}\mu_2 - \nu \tag{8}$$


Proof. Equation (7) follows by linearity from (1). For (8), expand
and use exchangeability to obtain

$$E(S_n^2) = E(U_{1n}^2 - U_{1n}U_{2n}). \tag{9}$$

Next, condition according to the four possibilities of involvement of
$U_{1n}$ and $U_{2n}$ in the most recent breakage (neither, $U_{1n}$ only,
$U_{2n}$ only, or both). For example, with probability $2/(n(n-1))$ they were
both involved in the most recent breakage, and $U_{1n}^2 - U_{1n}U_{2n}$ has
the conditional distribution

$$[U_{1,n-1} + \log(W)]^2 - [U_{1,n-1} + \log(W)][U_{1,n-1} + \log(1-W)] \tag{10}$$

where $W$ is a random variable with distribution $F$ and is independent
of $U_{1,n-1}$. Combining this with the results from the other three
possibilities, then simplifying, we find with some effort that

$$E(S_n^2) = \frac{(n+1)(n-2)}{n(n-1)}\,E(S_{n-1}^2) + \frac{2\mu_2}{n} - \frac{2\nu}{n(n-1)}. \tag{11}$$

With some patience, it can be shown by induction that (8) is the solution
to the recurrence (11) with initial condition $E(S_2^2) = \mu_2 - \nu$. □
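The induction mentioned at the end of the proof can be spot-checked numerically. The sketch below iterates the recurrence (11), read here as E(S_n^2) = [(n+1)(n-2)/(n(n-1))] E(S_{n-1}^2) + 2 mu2/n - 2 nu/(n(n-1)), starting from E(S_2^2) = mu2 - nu, and compares each step with the closed form (8). The function names and the values chosen for mu2 and nu are hypothetical; a finite check of this kind supports the formula but does not replace the induction.

```python
from fractions import Fraction

def expected_sample_variance(n, mu2, nu):
    """Closed form (8): {2(1 + 2/(n-1)) * sum_{k=2}^n 1/k - 2} * mu2 - nu."""
    h = sum(Fraction(1, k) for k in range(2, n + 1))
    return (2 * (1 + Fraction(2, n - 1)) * h - 2) * mu2 - nu

def recurrence_step(prev, n, mu2, nu):
    """One step of recurrence (11), from stage n-1 to stage n."""
    factor = Fraction((n + 1) * (n - 2), n * (n - 1))
    return factor * prev + 2 * mu2 / n - 2 * nu / (n * (n - 1))

def closed_form_solves_recurrence(mu2, nu, max_n=60):
    value = mu2 - nu                     # initial condition at n = 2
    if value != expected_sample_variance(2, mu2, nu):
        return False
    for n in range(3, max_n + 1):
        value = recurrence_step(value, n, mu2, nu)
        if value != expected_sample_variance(n, mu2, nu):
            return False
    return True

# Arbitrary illustrative moments, used only to exercise the check.
print(closed_form_solves_recurrence(Fraction(7, 2), Fraction(2, 5)))
```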


The  means  (1)  and  (7)  are  identical  because  the  expectation  operator 
is  linear  even  under  dependence.  However,  from  (2)  and  (8)  we  see  that  the 
lack of independence has modified the variance. These differences can be
studied  in  detail  by  examining  an  asymptotic  expansion  of  each  expression. 


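Theorem 3.

$$E(\bar{U}_n) = E(U_{in}) = \mu_1\left\{2\log(n) - 0.8456 + \frac{1}{n} - \frac{1}{6n^2} + \frac{1}{60n^4} + O\!\left(\frac{1}{n^6}\right)\right\} \tag{12}$$

$$\mathrm{Var}(U_{in}) = 2\mu_2\log(n) - (2.5797\mu_1^2 + 0.8456\mu_2) + \frac{4\mu_1^2 + \mu_2}{n} - \frac{12\mu_1^2 + \mu_2}{6n^2} + \frac{2\mu_1^2}{3n^3} + \frac{\mu_2}{60n^4} - \frac{2\mu_1^2}{15n^5} + O\!\left(\frac{1}{n^6}\right) \tag{13}$$

$$E(S_n^2) = 2\mu_2\log(n) - (\nu + 2.8456\mu_2) + 4\mu_2\frac{\log(n)}{n-1} - \frac{0.6911\mu_2}{n} + \frac{0.1422\mu_2}{n^2} - \frac{0.02447\mu_2}{n^3} - \frac{0.007804\mu_2}{n^4} + \frac{0.008863\mu_2}{n^5} + O\!\left(\frac{1}{n^6}\right) \tag{14}$$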


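Proof. These follow from two standard asymptotic expansions. Equation
(12) depends on the expansion of the harmonic series (e.g. Knuth, 1973,
Vol. 1, p. 74):

$$\sum_{k=1}^{n} \frac{1}{k} = \log(n) + \gamma + \frac{1}{2n} - \frac{1}{12n^2} + \frac{1}{120n^4} + O\!\left(\frac{1}{n^6}\right) \tag{15}$$

where $\gamma = 0.5772156649$ is Euler's constant. Equation (13) also uses
(e.g. p. 61 of Hansen, 1975)

$$\sum_{k=1}^{n} \frac{1}{k^2} = \frac{\pi^2}{6} - \frac{1}{n} + \frac{1}{2n^2} - \frac{1}{6n^3} + \frac{1}{30n^5} + O\!\left(\frac{1}{n^7}\right).$$

For (14), use (15) in (8) and multiply the nonlogarithmic part by the
expansion of $2/(n-1)$ as a power series in $1/n$. Combine terms to find

$$E(S_n^2) = \left\{2\log(n) + (2\gamma - 4) + \frac{4\log(n)}{n-1} + \frac{4\gamma - 3}{n} + \frac{24\gamma - 13}{6n^2} + \frac{24\gamma - 14}{6n^3} + \frac{240\gamma - 139}{60n^4} + \frac{240\gamma - 138}{60n^5} + O\!\left(\frac{1}{n^6}\right)\right\}\mu_2 - \nu$$

which evaluates to (14). □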
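The quality of these expansions is easy to gauge numerically. The following sketch compares the exact sums in (2) and (8) with the truncated series in (13) and (14), writing the rounded constants in their exact forms 4(pi^2/6 - 1), 2(1 - gamma) and so on. The function names and the moment values are hypothetical choices for illustration, and the formulas are as read from the surrounding text.

```python
import math

GAMMA = 0.5772156649015329          # Euler's constant

def var_exact(n, mu1, mu2):
    h1 = sum(1.0 / k for k in range(2, n + 1))
    h2 = sum(1.0 / k ** 2 for k in range(2, n + 1))
    return 2 * mu2 * h1 - 4 * mu1 ** 2 * h2

def var_series(n, mu1, mu2):
    """Truncated expansion (13), dropping the O(1/n^6) remainder."""
    const = 4 * (math.pi ** 2 / 6 - 1) * mu1 ** 2 + 2 * (1 - GAMMA) * mu2
    return (2 * mu2 * math.log(n) - const
            + (4 * mu1 ** 2 + mu2) / n
            - (12 * mu1 ** 2 + mu2) / (6 * n ** 2)
            + 2 * mu1 ** 2 / (3 * n ** 3)
            + mu2 / (60 * n ** 4)
            - 2 * mu1 ** 2 / (15 * n ** 5))

def s2_exact(n, mu2, nu):
    h1 = sum(1.0 / k for k in range(2, n + 1))
    return (2 * (1 + 2 / (n - 1)) * h1 - 2) * mu2 - nu

def s2_series(n, mu2, nu):
    """Truncated expansion (14), dropping the O(1/n^6) remainder."""
    g = GAMMA
    tail = ((4 * g - 3) / n + (24 * g - 13) / (6 * n ** 2)
            + (24 * g - 14) / (6 * n ** 3) + (240 * g - 139) / (60 * n ** 4)
            + (240 * g - 138) / (60 * n ** 5))
    braces = 2 * math.log(n) + (2 * g - 4) + 4 * math.log(n) / (n - 1) + tail
    return braces * mu2 - nu

mu1, mu2, nu = -0.8, 1.0, 0.4       # illustrative values only
for n in (20, 50, 200):
    print(n, var_exact(n, mu1, mu2) - var_series(n, mu1, mu2),
          s2_exact(n, mu2, nu) - s2_series(n, mu2, nu))
```

The printed differences should be negligibly small, consistent with a remainder of order 1/n^6.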


Although the leading terms in the variances (13) and (14) are identical,
the differences in their second terms cannot be neglected even for
moderately large $n$ due to the slowly increasing behavior of $\log(n)$.
For example, when $F$ places mass 1/2 at 1/4 and at 3/4, even with
$n$ as large as 50, we have $\mathrm{Var}(U_{in}) \approx 5.3$ while
$E(S_n^2) \approx 4.9$.
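The two figures in this example can be reproduced from the exact expressions (2) and (8). The short sketch below assumes those formulas as read from Theorems 1 and 2 and computes the three moments directly from the two-point distribution; the variable names are illustrative.

```python
import math

# F places mass 1/2 at each of 1/4 and 3/4, so each moment is a two-term average.
support = (0.25, 0.75)
mu1 = sum(math.log(w) for w in support) / 2
mu2 = sum(math.log(w) ** 2 for w in support) / 2
# log(W) * log(1 - W) takes the same value at both support points.
nu = math.log(0.25) * math.log(0.75)

n = 50
h1 = sum(1.0 / k for k in range(2, n + 1))
h2 = sum(1.0 / k ** 2 for k in range(2, n + 1))

var_single = 2 * mu2 * h1 - 4 * mu1 ** 2 * h2                # equation (2)
exp_sample_var = (2 * (1 + 2 / (n - 1)) * h1 - 2) * mu2 - nu  # equation (8)

print(f"{var_single:.2f} {exp_sample_var:.2f}")   # prints "5.26 4.90"
```

The gap of roughly 0.37 between the two values is the effect of dependence among the pieces that the text warns about.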


3.  AN  IDENTITY  FOR  RANDOM  BINARY  TREES 

Consider  the  class  of  random  binary  trees  with  n  endpoints  generated 
recursively  by  bifurcating  an  endpoint  chosen  uniformly  at  random  from  a 
tree  with  n-1  endpoints.  These  trees  are  responsible  for  part  of  the 
randomness  of  the  sequential  breakage  model  (the  other  component  can  be 
thought  of  as  entering  through  the  distribution  F  ) .  These  trees  are 
related  to  random  binary  search  trees  used  in  computer  science  (Knuth,  1973,  Vol. 
3, p. 423-471). For a tree with $n$ endpoints, let $N_{1n}$ and $N_{2n}$ denote
the distances (in numbers of edges) from each of two randomly chosen
endpoints to their nearest common ancestor, as illustrated in the figure.

Although moments involving $N_{1n}$ and $N_{2n}$ generally increase with $n$,
there is an expression for which this dependence cancels out.

Theorem 4. Regardless of the value of $n$,

$$E(N_{1n}^2) - E(N_{1n}) - E(N_{1n}N_{2n}) = -1.$$

Proof. Proceed by induction on $n$, conditioning on the four events
describing which of $N_{1n}$ and $N_{2n}$ were involved in the most recent
bifurcation. This is similar to the proof of Theorem 2. □
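For small trees the identity can be examined by brute force. The sketch below grows every tree with n endpoints together with its exact probability, then averages over ordered pairs of distinct endpoints. It assumes the identity concerns the combination E(N1^2) - E(N1) - E(N1*N2), which is the form consistent with Theorems 1 and 2, and that the two endpoints are distinct; the function names and the path encoding (tuples of 0/1 branch choices from the root) are illustrative.

```python
from fractions import Fraction
from itertools import permutations

def grow(leaves):
    """Yield each tree obtained by bifurcating one endpoint of `leaves`."""
    for i, path in enumerate(leaves):
        yield leaves[:i] + leaves[i + 1:] + (path + (0,), path + (1,))

def identity_value(n):
    """Exact E(N1^2) - E(N1) - E(N1*N2) for random trees with n >= 2 endpoints."""
    weighted = {((),): Fraction(1)}           # a single endpoint at the root
    for _ in range(n - 1):                    # one bifurcation per pass
        nxt = {}
        for leaves, prob in weighted.items():
            for child in grow(leaves):        # each endpoint chosen w.p. 1/len
                key = tuple(sorted(child))
                nxt[key] = nxt.get(key, 0) + prob / len(leaves)
        weighted = nxt
    total = Fraction(0)
    for leaves, prob in weighted.items():
        pairs = list(permutations(leaves, 2)) # ordered pairs of distinct endpoints
        for a, b in pairs:
            shared = 0                        # depth of nearest common ancestor
            while shared < min(len(a), len(b)) and a[shared] == b[shared]:
                shared += 1
            n1, n2 = len(a) - shared, len(b) - shared
            total += prob * Fraction(n1 * n1 - n1 - n1 * n2, len(pairs))
    return total

print([identity_value(n) for n in range(2, 8)])
```

Under these assumptions the value is the same for every n tried, which is the cancellation the theorem describes.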


ACKNOWLEDGEMENTS 


This  work  was  supported  in  part  by  U.S.  Army  Research  Office  Grant 
Number  DAAG29-79-C-0205  and  a  Princeton  University  Prize  Fellowship. 

We  are  grateful  to  Andrew  Odlyzko,  Stephen  L.  Teig,  Vincent  DellaPietra, 
and  Clifford  Hurvich  for  helpful  conversations. 


REFERENCES 


AITCHISON, J. and BROWN, J.A.C. (2nd ed., 1968). The Lognormal Distribution,
with Special Reference to Its Uses in Economics. Cambridge University
Press.

BROWN, G. and SANDERS, J.W. (1981). Lognormal genesis. Journal of Applied
Probability 18, 542-547.

BULMER, M.G. (1974). On fitting the Poisson lognormal distribution to
species-abundance data. Biometrics 30, 101-110.

CAMP, B.H. (1938). Notes on the distribution of the geometric mean. Ann.
Math. Statist. 9, 221-226.

CLARK, P.J. (1964). On the number of individuals per occupation in a human
society. Ecology 45, 367-372.

CRAMER, H. (1946). Mathematical Methods of Statistics. Princeton
Mathematical Series, No. 9, Princeton University Press.

DAVIES, G.R. (1946). Pricing and price levels. Econometrica 14, 219-226.

EPSTEIN, B. (1947). The mathematical description of certain breakage
mechanisms leading to the logarithmico-normal distribution. J. Franklin
Inst. 224, 471-477.

GIBRAT, R. (1931). Les inégalités économiques. Paris: Librairie du Recueil
Sirey.

HANSEN, E.R. (1975). A Table of Series and Products. New Jersey:
Prentice-Hall.

HERDAN, G. (1953). Small Particle Statistics. Amsterdam: Elsevier.

KAPTEYN, J.C. (1916). Skew frequency curves in biology and statistics.
Rec. Trav. bot. néerl. 13, 105.

KNUTH, D.E. (1973). The Art of Computer Programming; Volume 1, Fundamental
Algorithms, 2nd edition; Volume 3, Sorting and Searching. Reading,
Massachusetts: Addison-Wesley.

KOLMOGOROV, A.N. (1941). Über das logarithmisch normale Verteilungsgesetz
der Dimensionen der Teilchen bei Zerstückelung. C.R. Acad. Sci.
U.R.S.S. 31, 99-101.

MAY, R.M. (1975). Patterns of species abundance and diversity. Pages
81-120 in M.L. Cody and J.M. Diamond, eds. Ecology and Evolution of
Communities. Harvard University Press, Cambridge, Mass.

PATRICK, R. (1968). The structure of diatom communities in similar
ecological conditions. Am. Nat. 102, 173-183.

PIELOU, E.C. (1975). Ecological Diversity. Wiley, New York.

PRESTON, F.W. (1962). The canonical distribution of commonness and
rarity: Part I. Ecology 43, 185-215.

SUGIHARA, G. (1980). Minimal community structure: an explanation of
species abundance patterns. Am. Natur. 116, 770-787.

YUAN, P.T. (1933). On the logarithmic frequency distribution and the
semi-logarithmic correlation surface. Ann. Math. Statist. 30-74.


